Abstract

Cloud computing provides a computing platform for the users to 
meet their demands in an efficient, cost-effective way. Virtualization 
technologies are used in the clouds to aid the efficient usage of hard- 
ware. Virtual machines (VMs) are utilized to satisfy the user needs 
and are placed on physical machines (PMs) of the cloud for effective 
usage of hardware resources and electricity in the cloud. Optimizing 
the number of PMs used helps in cutting down the power consumption 
by a substantial amount. 

In this paper, we present an optimal technique to map virtual ma- 
chines to physical machines (nodes) such that the number of required 
nodes is minimized. We provide two approaches based on linear pro- 
gramming and quadratic programming techniques that significantly 
improve over the existing theoretical bounds and efficiently solve the 
problem of virtual machine (VM) placement in data centers. 



1 Introduction 



Cloud computing is a large-scale, network-based distributed computing
environment in which computing resources such as memory, processing power
and bandwidth are available to users on demand. The cloud computing
environment comprises several service models, such as Software as a
Service (SaaS), Platform as a Service (PaaS) and Infrastructure as a
Service (IaaS). These models are made available to users through
virtualization techniques. The users' demands are satisfied by a set of
servers hosted on virtual machines (VMs). The VMs utilize the resources of
underlying physical machines (PMs), or nodes, provided and operated by
organizations called 'cloud providers'. Examples of cloud providers
include Amazon EC2 [1], GoGrid [2] and Rackspace Cloud [3].

Adopting the use of virtual machines (VMs) in such large-scale envi- 
ronments enhances the number of available servers through multiple OS 
instances on a single node, thereby achieving efficient hardware utilization. 
However, there may be a number of underutilized nodes due to the inefficient 
mapping of virtual machines to physical machines. Minimizing the number 
of physical machines utilized helps in cutting down the power consumption 
drastically [9]. 

The placement algorithm for VMs in a data center allocates various 
resources such as memory, bandwidth, processing power, etc. from a physical 
machine (PM) to VMs such that the number of PMs used is minimized. 

This problem can be viewed as a multi-dimensional packing problem [15]
(Figure 1). The resource requests of VMs are considered as d-dimensional
vectors with non-negative entries (balls). The resources available at each
PM are considered to be a d-dimensional vector (each dimension signifies an
independent resource) with a magnitude of 1 along each dimension (bins).
The goal is to minimize the number of bins such that, for every bin, the
sum of the vectors placed in that bin is coordinate-wise no greater than
the bin's vector. Thus, the resource allocation problem is an instance of
the d-dimensional Vector Bin Packing problem (VBP) [15].
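
To make the vector view concrete, the following minimal sketch scales one VM request by the capacity of a PM so that every coordinate lies in [0, 1]. The function name `normalize_request` and the capacity figures are hypothetical and chosen only for illustration; the paper does not prescribe a particular normalization routine.

```python
from typing import List, Sequence


def normalize_request(request: Sequence[float],
                      capacity: Sequence[float]) -> List[float]:
    """Scale a VM's resource request by the PM capacity, dimension by dimension."""
    if len(request) != len(capacity):
        raise ValueError("request and capacity must have the same dimension")
    vector = []
    for demand, available in zip(request, capacity):
        if available <= 0:
            raise ValueError("capacity must be positive in every dimension")
        if demand < 0 or demand > available:
            raise ValueError("a single VM must fit on an empty PM")
        vector.append(demand / available)
    return vector


# Hypothetical PM: 64 GB memory, 32 cores, 10 Gbit/s bandwidth (illustrative values).
pm_capacity = [64.0, 32.0, 10.0]
vm_request = [16.0, 8.0, 2.5]
vm_vector = normalize_request(vm_request, pm_capacity)
```

After normalization, every PM is the all-ones vector and every VM is a point in the unit cube, which is the form assumed by the VBP definition.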

For d = 1, the VBP is identical to the 1-dimensional Bin Packing prob- 
lem. 

We now define the optimization problem that we are addressing in this 
paper. 

Vector Bin Packing problem (VBP) 

Given a set S of n d-dimensional vectors $p_1, p_2, \ldots, p_n$ from
$[0,1]^d$, find a packing (partition) of S into $A_1, A_2, \ldots, A_m$
such that

$$\sum_{p \in A_i} p^k \le 1 \qquad \forall i, k$$

($p^k$ denotes the projection of vector p along the k-th dimension). The
objective is to minimize the value of m, the number of partitions.

Figure 1: Bin-packing & Vector Bin Packing along 3-dimensions
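
The feasibility condition in the definition can be checked mechanically. The sketch below is an illustration only: the helper name `is_valid_packing`, the tolerance `EPS` and the sample vectors are assumptions, not part of the paper.

```python
from typing import Sequence

EPS = 1e-9  # tolerance for floating-point sums (an assumption of this sketch)


def is_valid_packing(vectors: Sequence[Sequence[float]],
                     bins: Sequence[Sequence[int]]) -> bool:
    """Return True if `bins` partitions the vector indices and no bin
    exceeds 1 in any dimension."""
    n = len(vectors)
    assigned = sorted(i for b in bins for i in b)
    if assigned != list(range(n)):
        return False  # some vector is missing or packed twice
    d = len(vectors[0]) if n else 0
    for b in bins:
        for k in range(d):
            if sum(vectors[i][k] for i in b) > 1.0 + EPS:
                return False  # bin overflows along dimension k
    return True


# Four 2-dimensional vectors (illustrative values).
vectors = [(0.5, 0.2), (0.4, 0.7), (0.5, 0.3), (0.1, 0.1)]
two_bins = [[0, 2], [1, 3]]      # loads (1.0, 0.5) and (0.5, 0.8)
overfull = [[0, 1, 2], [3]]      # first bin load is (1.4, 1.2)
```

Here `is_valid_packing(vectors, two_bins)` holds, while `overfull` violates the capacity along both dimensions.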

The vector bin packing problem is computationally hard; it is known to be
NP-hard [17].



2 Related work 

Vector Bin Packing (VBP). The one-dimensional bin packing problem has
been studied extensively. Fernandez de la Vega and Lueker [13] gave the
first Asymptotic Polynomial-Time Approximation Scheme (APTAS). They put
forward a rounding technique that allowed them to reduce the problem of
packing large items to finding an optimum packing of just a constant number
of items (at a cost of $\epsilon$ times the optimal solution, OPT). Their
algorithm was later improved by Karmarkar and Karp [21] to a guarantee of
$\mathrm{OPT} + O(\log^2 \mathrm{OPT})$ bins.

For 2-dimensional vector bin packing, Woeginger [32] proved that there
is no APTAS. For higher dimensions, Fernandez de la Vega and Lueker [13]
proposed a simple $(d + \epsilon)$-OPT algorithm, which extends the idea of
1-dimensional bin packing. Chekuri and Khanna [12] showed an
$O(\log d)$-approximation algorithm that runs in polynomial time for fixed
d. Bansal et al. [6] improved this result, showing a
$(\ln d + 1 + \epsilon)$-approximation algorithm for any $\epsilon > 0$.
Karger et al. [20] have recently proposed a polynomial approximation scheme
for randomized instances of multidimensional vector bin packing using
smoothing techniques. Patt-Shamir et al. [26] have recently explored the
vector bin packing problem with bins of varying sizes and propose a
$(\ln 2d + 1 + \epsilon)$-approximation algorithm for any $\epsilon > 0$.

Placement Algorithm. The problem of VM placement is at the core of
cloud computing. Several research works address the importance of placing
VMs appropriately [TBJEIE]. Vogels [31] notes the benefit of packing VMs
efficiently in server consolidation. Recently, Hermenier et al. [18]
developed a constraint-programming-based mechanism for dynamic
consolidation.

Several modified versions of First-Fit Decreasing (FFD) have been used
for VM placement. Verma et al. [30] propose an algorithm to pack VMs
optimally while minimizing the number of migrations. Khanna et al. [23]
propose a reconfiguration algorithm to cut down the wastage of physical
resources. Hyser et al. [19] propose an iterative rearrangement technique
for improving placements in a dynamic scenario. Bobroff et al. [5] present
a dynamic algorithm that forecasts the resource demands and packs VMs.
Shahabuddin et al. [28] propose a simple heuristic which aims to allocate
resources efficiently.

Despite the recent research trends towards virtualization, the problem
of VM placement remains largely unexplored. To address this gap, we
propose a linear-programming-based approach which places VMs efficiently
on a set of PMs.

The rest of this paper is organized as follows. Section 3 deals with the
formulation of the problem, Section 4 provides our algorithms for vector
bin packing (VBP) and Section 5 describes the experimental setup and
results. Section 6 concludes the paper.

3 Problem formulation 

We formulate the problem as an integer program in subsection 3.1. The
integer constraints are then relaxed and we formulate its dual (in
subsection 3.2). The solution of the relaxed integer linear program gives
useful insight into the optimal number of bins.

3.1 Integer Linear Program (ILP) formulation 

The vector bin packing problem (VBP) can be formulated as an integer
program. We use two binary variables, $x_{ij}$ and $y_j$. The binary
variable $x_{ij}$ indicates whether vector $p_i$ is assigned to bin j, and
the binary variable $y_j$ indicates whether bin j is in use. Our objective
is to minimize the number of bins used.

Notation Table

Symbol      Meaning
$x_{ij}$    Fraction of vector i packed in the j-th bin
$y_j$       Binary variable to determine usage of bin j
$p_i$       Input vector i (VM i)
n           Number of vectors (VMs)
m           Number of bins (PMs)
d           Dimension of each vector

Figure 2: Notation table for the integer linear program (ILP) formulation

The number of bins m can initially be set to a sufficiently large value
arrived at by any heuristic (for example, de la Vega and Lueker [14] give
an $O(d)$-OPT bound on the number of bins). Then, we formulate the integer
program (ILP) as follows:



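\begin{align}
\text{minimize} \quad & \sum_{j} y_j & & \tag{1} \\
\text{s.t.} \quad & \sum_{j} x_{ij} = 1 & & 1 \le i \le n \tag{2} \\
& \sum_{i} p_i^k \, x_{ij} \le 1 & & 1 \le j \le m,\ 1 \le k \le d \tag{3} \\
& y_j \ge x_{ij} & & 1 \le i \le n,\ 1 \le j \le m \tag{4} \\
& x_{ij} \in \{0, 1\} & & 1 \le i \le n,\ 1 \le j \le m \tag{5}
\end{align}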



The notation is summarized in Figure 2. The constraints of the ILP are
as follows:

• Constraint (2) states that every vector is packed in a bin. 

• Constraint (3) ensures that the packed vectors do not exceed the bin 
dimensions. 

• Constraint (4) marks a bin as used whenever a vector is assigned to it.

• Constraint (5) ensures that a vector is either packed entirely in a bin 
or not. 

Constraint (5) can be relaxed as follows to obtain a linear program (LP).

$$x_{ij} \ge 0 \qquad 1 \le i \le n,\ 1 \le j \le m \tag{5a}$$
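
For very small instances, the integer program (1)-(5) can be solved by plain enumeration, which makes the role of each constraint visible. This is a sketch for illustration only: it is exponential in n, it is not the solver used in the paper, and the name `solve_vbp_exact` and the sample vectors are assumptions.

```python
from itertools import product
from typing import Optional, Sequence, Tuple

EPS = 1e-9  # floating-point tolerance (an assumption of this sketch)


def solve_vbp_exact(vectors: Sequence[Sequence[float]],
                    m: int) -> Optional[Tuple[int, Tuple[int, ...]]]:
    """Enumerate all assignments of a non-empty vector list to m bins.

    Returns (bins_used, assignment) with the fewest bins, or None if no
    assignment respects the capacities.
    """
    n, d = len(vectors), len(vectors[0])
    best = None
    # Each tuple gives one bin per vector: constraints (2) and (5).
    for assignment in product(range(m), repeat=n):
        load = [[0.0] * d for _ in range(m)]
        for i, j in enumerate(assignment):
            for k in range(d):
                load[j][k] += vectors[i][k]
        if any(load[j][k] > 1.0 + EPS for j in range(m) for k in range(d)):
            continue  # violates capacity constraint (3)
        used = len(set(assignment))  # sum of y_j, with y_j = 1 iff bin j is hit: (4)
        if best is None or used < best[0]:
            best = (used, assignment)
    return best


vectors = [(0.5, 0.2), (0.4, 0.7), (0.5, 0.3), (0.1, 0.1)]
result = solve_vbp_exact(vectors, m=4)
```

For the sample instance, the total demand along the first dimension is 1.5, so one bin cannot suffice, and the enumeration finds a packing into two bins.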

We can obtain a feasible solution for the LP using any standard method
[13]. Using binary search, we can also find the least value of m,
$m \in \mathbb{Z}^+$, for which the relaxed ILP has a feasible solution.
The value of m thus obtained is a lower bound on the optimal solution of
the integer program, i.e. $m \le \mathrm{OPT}$. However, the solution
obtained is usually not integral. To tackle this problem, we formulate a
dual-maximization problem [29] for the above relaxed ILP.
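
The binary search over m can be sketched as follows. The feasibility oracle used here is an assumption of this sketch rather than the paper's LP solver: for the relaxed constraints (2), (3) and (5a), summing (3) over all bins shows that the total demand in each dimension must be at most m, and spreading every vector uniformly ($x_{ij} = 1/m$) meets (3) whenever that holds, so the check reduces to a per-dimension sum. Any LP solver could be plugged in through the `feasible` parameter instead.

```python
from typing import Callable, Sequence

EPS = 1e-9  # floating-point tolerance (an assumption of this sketch)


def relaxed_lp_feasible(vectors: Sequence[Sequence[float]], m: int) -> bool:
    """Feasibility of (2), (3), (5a) with m bins via the per-dimension load check."""
    d = len(vectors[0])
    return all(sum(p[k] for p in vectors) <= m + EPS for k in range(d))


def least_feasible_m(vectors: Sequence[Sequence[float]],
                     upper: int,
                     feasible: Callable[[Sequence[Sequence[float]], int], bool]
                     = relaxed_lp_feasible) -> int:
    """Smallest m in [1, upper] for which `feasible` holds.

    Assumes feasibility is monotone in m and that `upper` is feasible
    (upper = n always works, since every vector fits in a bin by itself).
    """
    lo, hi = 1, upper
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible(vectors, mid):
            hi = mid       # mid bins suffice; try fewer
        else:
            lo = mid + 1   # mid bins are too few
    return lo


vectors = [(0.5, 0.2), (0.4, 0.7), (0.5, 0.3), (0.1, 0.1)]
lower_bound = least_feasible_m(vectors, upper=len(vectors))
```

For the sample instance the per-dimension totals are 1.5 and 1.3, so the search returns 2, which is a lower bound on OPT.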

3.2 Dual-maximization problem 

We introduce several new variables to formulate the dual. The derivation
of the dual-maximization problem is given in Appendix 8.1. We arrive at
the following objective and constraints:



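\begin{align}
\text{maximize:} \quad & \sum_{i} \sum_{j} x_{ij} z_{ij} & & \tag{6} \\
\text{s.t.} \quad & \sum_{j} x_{ij} = 1 & & 1 \le i \le n \tag{7} \\
& \sum_{i} p_i^k \, x_{ij} \le 1 & & 1 \le j \le m,\ 1 \le k \le d \tag{8} \\
& \sum_{i} z_{ij} \le 1 & & 1 \le j \le m \tag{9} \\
& x_{ij},\ z_{ij} \ge 0 & & 1 \le i \le n,\ 1 \le j \le m \tag{10}
\end{align}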

This is a nonlinear program (NLP) as the objective function is nonlinear. 
Hereafter, we shall refer to it as NLP. 

The number of variables in the NLP can be reduced by performing the
following substitution for the values of $z_{ij}$.

Theorem 1. The optimal solution to the NLP will still be optimal when
the value of $z_{ij}$ is replaced by $x_{ij} / \sum_i x_{ij}$.

Proof. From Jensen's inequality, we have that if f is a convex function
("concave-up") on an interval I and $a_i \in I$, then for weights
$\lambda_i$ summing to 1,

$$f\left(\sum_{i=1}^{n} \lambda_i a_i\right) \le \sum_{i=1}^{n} \lambda_i f(a_i)$$

We can apply Jensen's inequality with $\lambda_i$ and $a_i$ corresponding
to $x_{ij}$ and $z_{ij}$, respectively. The modified set of equations in
this case is as follows:

$$f\left(\sum_{j=1}^{m} x_{ij} z_{ij}\right) \le \sum_{j=1}^{m} x_{ij} f(z_{ij}) \qquad 1 \le i \le n \tag{a}$$

From the property of convex functions, we have

$$f(tx) \le t f(x) \qquad 0 \le t \le 1 \tag{b}$$

From (a) and (b), we have

$$f\left(\frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{m} x_{ij} z_{ij}\right) \le \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{m} x_{ij} f(z_{ij}) \tag{c}$$

Since f(x) is a convex function, any value which maximizes x also maximizes
f(x) and vice-versa. Hence, from the inequality (c), we have that the term
$x_{ij} f(z_{ij})$ should be maximized for the objective function to be
maximized. Indirectly, $z_{ij}$ has to be maximized relative to the values
of $x_{ij}$. The value of $z_{ij}$ is constrained by the constraint (9),
and hence we come up with the following tight function for $z_{ij}$:

$$z_{ij} = \frac{x_{ij}}{\sum_i x_{ij}} \tag{11}$$

Since $\sum_i \sum_j x_{ij} = n$, the value of $\sum_i x_{ij} \approx n/m$
for an appropriately chosen value m. Thus, the objective function can be
reduced to a quadratic term.
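
The reduction can be written out step by step. The following is a worked sketch, assuming that the approximation $\sum_i x_{ij} \approx n/m$ holds for every bin j; substituting the tight value of $z_{ij}$ into the objective (6) gives:

```latex
\sum_{i} \sum_{j} x_{ij} z_{ij}
  = \sum_{j} \frac{\sum_{i} x_{ij}^{2}}{\sum_{i} x_{ij}}
  \approx \frac{m}{n} \sum_{i} \sum_{j} x_{ij}^{2}
```

For a fixed m, the factor m/n is a positive constant, so maximizing the left-hand side amounts to maximizing $\sum_i \sum_j x_{ij}^2$.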

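From (11), the NLP now becomes:

\begin{align}
\text{maximize:} \quad & \sum_{i} \sum_{j} x_{ij}^{2} & & \tag{12} \\
\text{s.t.} \quad & \sum_{j} x_{ij} = 1 & & 1 \le i \le n \tag{13} \\
& \sum_{i} p_i^k \, x_{ij} \le 1 & & 1 \le j \le m,\ 1 \le k \le d \tag{14} \\
& x_{ij} \ge 0 & & 1 \le i \le n,\ 1 \le j \le m \tag{15}
\end{align}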

The optimal solution to the above modified NLP, NLP', must necessarily be
integral (follows from Theorem 1). In this light, we now present our
algorithms, which provide the (near-)optimal integer solution.
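
A small numeric illustration of why the quadratic objective favors integral assignments: for any row with $\sum_j x_{ij} = 1$ and non-negative entries, $\sum_j x_{ij}^2 \le 1$, with equality only when a single entry equals 1. The sketch below only evaluates objective (12) on two hand-picked matrices; the helper name and the matrices are assumptions, and whether the value n is attainable on a given instance still depends on m and on the capacity constraint (14).

```python
from typing import Sequence


def quadratic_objective(x: Sequence[Sequence[float]]) -> float:
    """Objective (12): the sum of squared entries of the assignment matrix."""
    return sum(value * value for row in x for value in row)


# Two vectors, two bins; every row sums to 1 as constraint (13) requires.
integral = [[1.0, 0.0], [0.0, 1.0]]     # each vector placed wholly in one bin
fractional = [[0.5, 0.5], [0.5, 0.5]]   # each vector split evenly

integral_value = quadratic_objective(integral)
fractional_value = quadratic_objective(fractional)
```

The integral matrix reaches the upper bound n = 2, while the evenly split matrix scores only 1.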

4 Algorithms and their complexity

In this section, we provide two algorithms to solve the vector bin packing 
(VBP) problem. The main idea is to harness the polynomial-time solvability 
of linear and quadratic programming techniques. 

4.1 Quadratic programming 

The quadratic program NLP' can be solved using various efficient techniques
such as interior-point methods, active-set methods [25], gradient
techniques or extensions of the simplex algorithm [25].

Complexity. Kozlov et al. [22] presented a polynomial-time algorithm for
solving convex quadratic programs. Since our objective function is a convex
function, NLP' can be solved in polynomial time.

4.2 Linear programming 

The relaxed version of the integer linear program (ILP) can also be used to
derive (near-)optimal solutions for the VBP problem. The algorithm is as
follows:

Algorithm 1 PackingVectors($P_n$, d)
Require: A set of vectors $p_1, p_2, \ldots, p_n$, denoted $P_n$;
dimension of vectors d

(m, X) = SolveLP($P_n$, d)
if m > § then
    return FirstFit($P_n$, d)
else if m < y^j then
    return GreedyLP($P_n$, X, d)
else
    return IterativePack($P_n$, X, d)
end if



Algorithm 1 is an iterative algorithm which packs vectors in every
iteration until the input is exhausted. The algorithm branches into three
cases depending upon the solution returned by the relaxed integer linear
program (LP).
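
The FirstFit branch is named but not spelled out in the text. A minimal sketch follows, assuming the standard first-fit rule extended coordinate-wise to vectors: each vector goes into the first open bin that has room along every dimension, and a new bin is opened otherwise. The function name `first_fit` and the tolerance are assumptions of this sketch.

```python
from typing import List, Sequence

EPS = 1e-9  # floating-point tolerance (an assumption of this sketch)


def first_fit(vectors: Sequence[Sequence[float]]) -> List[List[int]]:
    """Return the bins as lists of vector indices, in order of opening."""
    d = len(vectors[0]) if vectors else 0
    loads: List[List[float]] = []   # current load of each open bin
    bins: List[List[int]] = []      # indices packed into each open bin
    for i, p in enumerate(vectors):
        for j, load in enumerate(loads):
            if all(load[k] + p[k] <= 1.0 + EPS for k in range(d)):
                for k in range(d):
                    load[k] += p[k]
                bins[j].append(i)
                break
        else:
            # No open bin has room in every dimension: open a new one.
            loads.append(list(p))
            bins.append([i])
    return bins


vectors = [(0.5, 0.2), (0.4, 0.7), (0.5, 0.3), (0.1, 0.1)]
packing = first_fit(vectors)
```

Sorting the input in decreasing order of some size measure before calling `first_fit` would give a First-Fit Decreasing variant; the choice of measure for vectors is left open here.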

Algorithm 2 GreedyLP($P_n$, X, d)
Require: A set of vectors $p_1, p_2, \ldots, p_n$, denoted $P_n$, and a
set of $x_{ij}$ values X.

X' = SortDescending(X)
while $X' \ne \emptyset$ do
    Remove the top element $x_{ij}$ from X'
    if vector $p_i$ fits in bin j then
        Pack(i, j)
        Remove $p_i$ from $P_n$
    end if
end while
PackingVectors($P_n$, d)



Algorithm 2 is a subroutine of Algorithm 1 which packs the vectors
greedily, given the solution set $X = \{x_{ij}\ \forall i, j\}$.
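
One greedy pass of this subroutine can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: entries with $x_{ij} = 0$ are skipped (the pseudocode does not say how they are handled), ties keep their input order, and the recursive call to PackingVectors on the leftover vectors is replaced by returning them to the caller.

```python
from typing import List, Sequence, Tuple

EPS = 1e-9  # floating-point tolerance (an assumption of this sketch)


def greedy_lp_round(vectors: Sequence[Sequence[float]],
                    x: Sequence[Sequence[float]],
                    m: int) -> Tuple[List[List[int]], List[int]]:
    """Pack vectors by descending x[i][j]; return (bins, unpacked indices)."""
    n, d = len(vectors), len(vectors[0])
    loads = [[0.0] * d for _ in range(m)]
    bins: List[List[int]] = [[] for _ in range(m)]
    packed = set()
    # SortDescending(X): visit the (i, j) pairs from largest to smallest x_ij.
    entries = sorted(((x[i][j], i, j) for i in range(n) for j in range(m)),
                     key=lambda entry: -entry[0])
    for value, i, j in entries:
        if i in packed or value <= 0.0:
            continue  # already placed, or the LP gave this pair no weight
        if all(loads[j][k] + vectors[i][k] <= 1.0 + EPS for k in range(d)):
            for k in range(d):
                loads[j][k] += vectors[i][k]
            bins[j].append(i)   # Pack(i, j)
            packed.add(i)
    unpacked = [i for i in range(n) if i not in packed]
    return bins, unpacked


# Hypothetical fractional solution for three vectors and two bins.
vectors = [(0.6, 0.2), (0.6, 0.3), (0.3, 0.5)]
x = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
bins, unpacked = greedy_lp_round(vectors, x, m=2)
```

In the sample run, vectors 0 and 1 follow their dominant fractions into bins 0 and 1, and vector 2 joins bin 0, the first of its two tied choices that still has room.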

Algorithm 3 IterativePack($P_n$, X, d)
Require: A set of vectors $p_1, p_2, \ldots, p_n$, denoted $P_n$, and a
set of $x_{ij}$ values X.

$P'_n = P_n$
Z = FindDualObj(X, d)
for j = 1 to m do
    if $\sum_i x_{ij} z_{ij} > \frac{1}{2}$ then
        $X'_j$ = SortDescending($X_j$)
        $X'_j$ = RemoveLessThanHalf($X'_j$)
        Pack($X'_j$)
        $P'_n = P'_n \setminus$ PackedVectors
    end if
end for
PackingVectors($P'_n$, d)



Algorithm 3 is a subroutine of Algorithm 1 which packs the bins having a
utility factor ($\sum_i x_{ij} z_{ij}$) of more than half. It ensures that
the efficiently assigned vectors are packed into their corresponding bins.
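
A single pass of this subroutine might look like the sketch below. Several details are assumptions, since the pseudocode leaves them open: FindDualObj is taken to use $z_{ij} = x_{ij} / \sum_i x_{ij}$ from equation (11), RemoveLessThanHalf is read as keeping only entries with $x_{ij} \ge 1/2$, Pack is given an explicit capacity check, and the leftover indices are returned instead of being passed to a recursive PackingVectors call.

```python
from typing import Dict, List, Sequence, Tuple

EPS = 1e-9  # floating-point tolerance (an assumption of this sketch)


def iterative_pack_pass(vectors: Sequence[Sequence[float]],
                        x: Sequence[Sequence[float]],
                        m: int) -> Tuple[Dict[int, List[int]], List[int]]:
    """Pack bins whose utility exceeds 1/2; return (bins, remaining indices)."""
    n, d = len(vectors), len(vectors[0])
    bins: Dict[int, List[int]] = {}
    packed = set()
    for j in range(m):
        column_sum = sum(x[i][j] for i in range(n))
        if column_sum <= 0.0:
            continue  # the LP left this bin empty
        # Utility sum_i x_ij * z_ij with z_ij = x_ij / column_sum.
        utility = sum(x[i][j] ** 2 for i in range(n)) / column_sum
        if utility <= 0.5:
            continue
        # Keep the dominant entries of column j, largest first.
        candidates = sorted((i for i in range(n)
                             if x[i][j] >= 0.5 and i not in packed),
                            key=lambda i: -x[i][j])
        load = [0.0] * d
        for i in candidates:
            if all(load[k] + vectors[i][k] <= 1.0 + EPS for k in range(d)):
                for k in range(d):
                    load[k] += vectors[i][k]
                bins.setdefault(j, []).append(i)
                packed.add(i)
    remaining = [i for i in range(n) if i not in packed]
    return bins, remaining


# Hypothetical LP solution for three vectors and two bins.
vectors = [(0.5, 0.5), (0.4, 0.4), (0.7, 0.2)]
x = [[1.0, 0.0], [0.8, 0.2], [0.2, 0.8]]
bins, remaining = iterative_pack_pass(vectors, x, m=2)
```

In the sample run both bins have utility above one half (0.84 and 0.68), so vectors 0 and 1 are fixed in bin 0 and vector 2 in bin 1.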

Complexity. The subroutines of Algorithm 1, Algorithms 2 and 3, run in
polynomial time. Solving the relaxed integer linear program can be done in
polynomial time [27]. The First Fit heuristic also runs in polynomial time.
Thus, Algorithm 1 runs in polynomial time.

Figure 3: Mean approximation ratio (PackingVectors/OPT), averaged over
about 2000 randomized trials for each dimension. For dimensions
$d \le 10$, the mean approximation ratio stays below 1.2.

5 Experimental setup and results 

We compare our 'PackingVectors' algorithm (Algorithm 1), discussed above,
with the existing theoretical worst-case bound for the vector bin packing
(VBP) problem.

Tools used. The Mixed Integer Linear Programming (MILP) solver 'lp-solve'
was used to derive exact solutions for randomized input instances (20 VM
configurations). 'lp-solve' was also used as a linear program solver in
Algorithm 1.

We ran 2000 randomized trials for each dimension from 2 to 10
($2 \le d \le 10$). The number of input VMs, n, was about 20 in each
trial.
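
The measurement loop can be sketched as below. The text does not state how the random instances were drawn, so the uniform distribution on $[0,1)^d$ used here is an assumption, as are the function names and the seed handling. The `heuristic` and `exact` arguments stand for any two functions that map an instance to a number of bins.

```python
import random
from typing import Callable, List, Sequence, Tuple

Instance = List[Tuple[float, ...]]


def random_instance(n: int, d: int, rng: random.Random) -> Instance:
    """Draw n vectors with independent uniform coordinates (an assumption)."""
    return [tuple(rng.random() for _ in range(d)) for _ in range(n)]


def mean_approximation_ratio(heuristic: Callable[[Sequence], int],
                             exact: Callable[[Sequence], int],
                             trials: int, n: int, d: int,
                             seed: int = 0) -> float:
    """Average heuristic(instance) / exact(instance) over random trials."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        instance = random_instance(n, d, rng)
        total += heuristic(instance) / exact(instance)
    return total / trials


sample = random_instance(20, 3, random.Random(1))
```

With the setup described above, one would call `mean_approximation_ratio` with `trials=2000`, `n=20` and each d from 2 to 10, passing a bin-counting wrapper around the placement algorithm and around the exact solver.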

Our results were compared with the exact solution of the optimal number
of PMs (bins), and the mean approximation factor was computed. The mean
approximation ratios are shown in Figure 3.

Our results were also compared with the existing approximation bound
given by Bansal et al. [6] and were found to show a substantial
improvement, as seen in Figure 4.

[Figure 4 plot: bar chart with legend entries "Observed" and "Theoretical Upper Bound"; x-axis: Dimensions]

Figure 4: Our result vs. theoretical upper bound (ln d). The bars colored
red indicate the mean approximation ratios of our algorithm, whereas the
green bars indicate the performance of the current best algorithm [6].

6 Conclusions and future work 

We presented two novel algorithms for placement of VMs in data centers. 
Unlike existing research based on simple First Fit heuristics, our techniques 
take advantage of the polynomial-time linear and quadratic programs, and 
provide (near-)optimal solutions to the vector bin packing (VBP) problem. 

Our experiments indicate a substantial improvement of our approach over
the existing worst-case bounds and show that our algorithm
'PackingVectors' yields near-optimal placements on randomized inputs, with
a mean approximation ratio below 1.2 for $d \le 10$. As part of future
work, we intend to extend our techniques to handle dynamic placements and
continuous optimization of data centers.

8 Appendix

8.1 Dual formulation of the ILP 

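\begin{align}
\text{minimize:} \quad & \sum_{j} y_j & & \tag{16} \\
\text{such that} \quad & \sum_{j} x_{ij} = 1 & & 1 \le i \le n \tag{17} \\
& \sum_{i} p_i^k \, x_{ij} \le 1 & & 1 \le j \le m,\ 1 \le k \le d \tag{18} \\
& y_j \ge x_{ij} & & 1 \le i \le n,\ 1 \le j \le m \tag{19} \\
& x_{ij} \in \{0, 1\} & & 1 \le i \le n,\ 1 \le j \le m \tag{20}
\end{align}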



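Multiply constraint (19) by positive multipliers $z_{ij}$ corresponding to
the $x_{ij}$'s. Adding all such constraints, we obtain

$$\sum_{j} \sum_{i} y_j z_{ij} \ge \sum_{j} \sum_{i} x_{ij} z_{ij}$$

Further, we have

$$\sum_{j} y_j \ge \sum_{j} \sum_{i} y_j z_{ij} \ge \sum_{j} \sum_{i} x_{ij} z_{ij} \qquad \text{subject to} \quad \sum_{i} z_{ij} \le 1 \tag{21}$$

Thus, the minimization problem can be reframed as a maximization problem
with the constraint (21) and the objective function being

$$\max: \sum_{i} \sum_{j} x_{ij} z_{ij}$$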
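
Adding the new constraints and relaxing constraint (20), the dual problem
is as follows:

\begin{align}
\text{maximize:} \quad & \sum_{i} \sum_{j} x_{ij} z_{ij} & & \\
\text{such that} \quad & \sum_{j} x_{ij} = 1 & & 1 \le i \le n \\
& \sum_{i} p_i^k \, x_{ij} \le 1 & & 1 \le j \le m,\ 1 \le k \le d \\
& \sum_{i} z_{ij} \le 1 & & 1 \le j \le m \\
& x_{ij},\ z_{ij} \ge 0 & & 1 \le i \le n,\ 1 \le j \le m
\end{align}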
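
The chain of inequalities leading to (21) can be checked numerically on small examples. The sketch below assumes the multipliers $z_{ij} = x_{ij} / \sum_i x_{ij}$ (so that $\sum_i z_{ij} = 1$ for every non-empty bin) and takes $y_j = \max_i x_{ij}$, the smallest value allowed by constraint (19). The helper names and the sample matrices are illustrative assumptions.

```python
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def relaxed_bin_count(x: Matrix) -> float:
    """sum_j y_j with the smallest y_j permitted by (19): y_j = max_i x_ij."""
    m = len(x[0])
    return sum(max(row[j] for row in x) for j in range(m))


def dual_objective(x: Matrix) -> float:
    """sum_i sum_j x_ij * z_ij with z_ij = x_ij / sum_i x_ij."""
    n, m = len(x), len(x[0])
    total = 0.0
    for j in range(m):
        column_sum = sum(x[i][j] for i in range(n))
        if column_sum > 0.0:
            total += sum(x[i][j] ** 2 for i in range(n)) / column_sum
    return total


# Integral packing: vectors 0 and 1 share bin 0, vector 2 uses bin 1.
integral = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
# A fractional assignment of two vectors to two bins.
fractional = [[0.8, 0.2], [0.4, 0.6]]
```

For the integral packing both quantities equal the number of bins in use (2), so the bound is tight; for the fractional matrix the dual objective is about 1.17 against a primal value of 1.4.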






